Abstract

We consider the detection of activations over graphs under Gaussian noise, where signals are piece-wise constant
over the graph. Despite the wide applicability of such a detection algorithm, there has been little success in the
development of computationally feasible methods with provable theoretical guarantees for general graph topologies.
We cast this as a hypothesis testing problem, and first provide a universal necessary condition for asymptotic distin- 
guishability of the null and alternative hypotheses. We then introduce the spanning tree wavelet basis over graphs, 
a localized basis that reflects the topology of the graph, and prove that for any spanning tree, this approach can dis- 
tinguish null from alternative in a low signal-to-noise regime. Lastly, we improve on this result and show that using 
the uniform spanning tree in the basis construction yields a randomized test with stronger theoretical guarantees that 
in many cases matches our necessary conditions. Specifically, we obtain near-optimal performance in edge transitive 
graphs, $k$-nearest neighbor graphs, and $\epsilon$-graphs.

1 Introduction 

This paper focuses on the problem of detecting activations over a graph when observations are corrupted by noise. The 
problem of detecting graph-structured activations is relevant to many applications including identifying congestion in 
router and road networks, eliciting preferences in social networks, and detecting viruses in human and computer 
networks. Furthermore, these applications require that the method is scalable to large graphs. Luckily, computer 
science boasts a plethora of efficient graph based algorithms that we can adapt to the detection framework. 

1.1 Contributions 

In this paper, we will be testing if there is a non-zero piece-wise constant activation pattern on the graph given ob- 
servations that are corrupted by Gaussian white noise. We show that correctly distinguishing the null and alternative 
hypotheses is impossible if the signal-to-noise ratio does not grow quickly with respect to the allowable number of 
discontinuities in the activation pattern (Section 2). Since a test based on the scan statistic which matches the obser- 
vations with all possible activation patterns by brute force is infeasible, we propose a Haar wavelet basis construction 
for general graphs, which is formed by hierarchically dividing a spanning tree of the graph (Section 3). We find that 
the size and power of the test can be bounded in terms of the number of signal discontinuities and the spanning tree, 
immediately giving us a result for any spanning tree. We then propose choosing a spanning tree uniformly at random 
(this can be done efficiently), and show that this bound can be improved by a factor of the average effective resistance 
of the edges across which the signal is non-constant (Section 4). With this machinery in place we are able to show 
that for edge transitive graphs, such as lattices, $k$-nearest neighbor graphs, and $\epsilon$-geometric random graphs, our test is
nearly optimal in that the upper bounds match the fundamental limits of detection up to logarithmic factors (Section 5).
1.2 Problem Setup



Consider an undirected graph $G$ defined by a set of vertices $V$ ($|V| = n$) and undirected edges $E$ ($|E| = m$), which are
unordered pairs of vertices. Throughout this study we will assume that the graph $G$ is known. The statistical setting
that we will address is the normal means model,

$$y = x + \epsilon,$$

where $x \in \mathbb{R}^V$, $\epsilon \sim N(0, \sigma^2 I_V)$, and $\sigma^2$ is known. Specifically, we assume that there are parameters $\rho, \mu$ (possibly
dependent on $n$) such that

$$\mathcal{X} = \{x \in \mathbb{R}^V : |\{(v, w) \in E : x_v \neq x_w\}| \le \rho,\ \|x\| \ge \mu\}$$

defines the class of possible $x$. Hence, the possible signals have few edges across which the values of $x$ differ. In
graph-structured activation detection we are concerned with statistically testing the null and alternative hypotheses,

$$H_0 : y \sim N(0, \sigma^2 I), \qquad H_1 : y \sim N(x, \sigma^2 I),\ x \in \mathcal{X}. \tag{1}$$

$H_0$ represents business as usual while $H_1$ encompasses all of the foreseeable anomalous activity. Let a test be a
mapping $T(y) \in \{0, 1\}$, where 1 indicates that we reject the null.

It is imperative that we control both the probability of false alarm and the false acceptance of the null. To this end,
we define our measure of risk to be

$$R(T) = \mathbb{E}_0[T] + \sup_{x \in \mathcal{X}} \mathbb{E}_x[1 - T],$$

where $\mathbb{E}_x$ denotes the expectation with respect to $y \sim N(x, \sigma^2 I)$. The test $T$ may be randomized, in which case the
risk is $\mathbb{E}_T R(T)$. Notice that if the distribution of the random test $T$ is independent of $x$, then
$\mathbb{E}_T \sup_{x \in \mathcal{X}} \mathbb{E}_x[1 - T] \ge \sup_{x \in \mathcal{X}} \mathbb{E}_{T,x}[1 - T]$. This is the setting of Arias-Castro et al. [2011], which we should contrast with the Bayesian
setup in Addario-Berry et al. [2010]. We will say that $H_0$ and $H_1$ are asymptotically distinguished by a test $T$
if $\lim_{n \to \infty} R(T) = 0$. If such a test exists then $H_0$ and $H_1$ are asymptotically distinguishable; otherwise they are
asymptotically indistinguishable.

To aid us in our study we introduce some mathematical terminology. Let the edge-incidence matrix of $G$ be
$\nabla \in \mathbb{R}^{E \times V}$ such that for $(v, w) \in E$, $\nabla_{(v,w),v} = 1$, $\nabla_{(v,w),w} = -1$ (the order of $(v, w)$ is chosen arbitrarily) and is 0
elsewhere. For a vector $w \in \mathbb{R}^V$, $\mathrm{supp}(w) = \{v \in V : w_v \neq 0\}$ and $\|w\|_0 = |\mathrm{supp}(w)|$, so $\|\nabla x\|_0 \le \rho$ for all
$x \in \mathcal{X}$. We will be constructing spanning trees $\mathcal{T}$ of the graph $G$, which are connected subsets of $E$ with no cycles.
Furthermore, we will denote the edge-incidence matrix of $\mathcal{T}$ as $\nabla_\mathcal{T}$.
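As a concrete illustration of this notation, the following minimal sketch builds the edge-incidence matrix of a small graph and evaluates the cut size $\|\nabla x\|_0$ of a piece-wise constant signal. The 4-cycle, the signal, and the function names `incidence_matrix` and `cut_size` are illustrative assumptions and do not appear in the original text.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def incidence_matrix(n: int, edges: Sequence[Edge]) -> List[List[int]]:
    """Return the |E| x |V| edge-incidence matrix with one +1 and one -1 per row."""
    grad = [[0] * n for _ in edges]
    for row, (v, w) in enumerate(edges):
        grad[row][v] = 1    # the orientation of (v, w) is arbitrary
        grad[row][w] = -1
    return grad


def cut_size(grad: List[List[int]], x: Sequence[float]) -> int:
    """Return ||grad x||_0, the number of edges across which x changes value."""
    return sum(
        1 for row in grad
        if sum(coef * xv for coef, xv in zip(row, x)) != 0
    )


# Hypothetical example: the 4-cycle 0-1-2-3-0 with an activation on {0, 1}.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
grad = incidence_matrix(4, edges)
x = [1.0, 1.0, 0.0, 0.0]
rho = cut_size(grad, x)   # edges (1, 2) and (3, 0) are cut
```

A constant signal has cut size zero, which is why the class $\mathcal{X}$ is controlled through the number of edges on which the signal changes rather than through the number of active vertices.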



1.3 Related Work 

The statistical problem that we are addressing can be broadly classified as a high-dimensional Gaussian goodness-of-fit
test. This is a well studied problem when the structure of $H_1$ derives from a smooth function space such as an ellipsoid,
Besov space or Sobolev space Ingster [1987], Ingster and Suslina [2003]. The function space $\mathcal{X}$ that we are proposing
is combinatorial in nature. This statistical problem has only recently been studied theoretically Addario-Berry et al.
[2010], Arias-Castro et al. [2011], although to the best of our knowledge none have addressed the problem under
arbitrary graph structure. More broadly, this work falls under the purview of multiple hypothesis testing, which has a
rich history Benjamini and Hochberg [1995]. Unfortunately, aside from a few special cases Hall and Jin [2010], the
multiple tests are assumed to be independent, making any such work not applicable to our setting.

In this paper, we evaluate our method by its ability to distinguish $H_0$ from $H_1$; however, the procedure is based
on constructing a wavelet basis over graphs, which is relevant for other problems such as denoising and compression.
Wavelets are multi-resolution bases that can represent inhomogeneous signals efficiently using a few non-zero wavelet
coefficients, which makes them attractive for denoising, compression and detection. As a result, they have been used
extensively in mathematics, signal processing, statistics and physics Mallat [1999]. They have also been used with
great success in statistics, with extensive theoretical guarantees Donoho and Johnstone [1995], Härdle et al. [1998],
Vidakovic [1999]. Recently there has been some attention paid to developing wavelets for graphs. Unfortunately, most
of these have either focused on graphs with a known hierarchical structure Gavish et al. [2010], Ram et al. [2011],
Singh et al. [2010], or do not come with approximation or sparsifying properties that can be used for our class of
graph functions $\mathcal{X}$ Hammond et al. [2011], Coifman and Maggioni [2006].
2 Universal Lower Bound

In order to more completely understand the problem of detecting anomalous activity in graphs, we prove that there is
a universal minimum signal strength under which $H_0$ and $H_1$ are asymptotically indistinguishable. The proof is based
on a lemma developed in Arias-Castro et al. [2008], but the strategic use of this lemma is novel. Our construction
of the 'worst case' prior gives a significantly tighter bound than would a more naive implementation. Indeed, it is
interesting to note that the worst case prior is a uniform distribution over the largest unstructured signals that we are
allowed in $H_1$ that are nearly disjoint.

Theorem 1. Let the maximum degree of $G$ be $d_{\max}$. Consider the alternative, $H_1$, in which the cut size of each signal
in $\mathcal{X}$ is bounded by $\rho$, with $\lim_{n \to \infty} \rho = \infty$ and $\rho < dn$. $H_0$ and $H_1$ are asymptotically indistinguishable if

$$\frac{\mu}{\sigma} = o\left( \sqrt{\min\left\{ \frac{\rho}{d_{\max}}, \sqrt{n} \right\}} \right).$$

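Proof. We begin by constructing a prior distribution over $\mathcal{X}$. This portion of the proof derives from the analysis in
Arias-Castro et al. [2008] and closely mirrors that of Addario-Berry et al. [2010], Arias-Castro et al. [2011]. We will
suppose that we have some collection $\mathcal{S} \subseteq 2^V$ from which we draw an $S \in \mathcal{S}$ uniformly at random. Then the signal is
constructed as $x = \frac{\mu}{\sqrt{|S|}} 1_S$, giving us a prior distribution $\pi$ over $\mathcal{X}$. Call the Bayes risk $R^*$.

Lemma 2 (Addario-Berry et al. [2010]). Let $S$ and $S'$ be drawn uniformly at random from $\mathcal{S}$. Then the Bayes risk $R^*$
is bounded by

$$R^* \ge 1 - \frac{1}{2} \sqrt{\mathbb{E} \exp\left( \frac{\mu^2}{\sigma^2} \frac{|S \cap S'|}{\sqrt{|S||S'|}} \right) - 1}.$$

Hence, if $\mathbb{E} \exp\left( \frac{\mu^2}{\sigma^2} \frac{|S \cap S'|}{\sqrt{|S||S'|}} \right) \to 1$, then $H_0$ and $H_1$ are asymptotically indistinguishable. Let $p = \lfloor \min\{\rho/d_{\max}, \sqrt{n}\} \rfloor$
and construct $\mathcal{S}$ to be all subsets of $V$ of size $p$. Then

$$\mathbb{E} \exp\left( \frac{\mu^2}{\sigma^2} \frac{|S \cap S'|}{\sqrt{|S||S'|}} \right) = \mathbb{E} \exp\left( \frac{\mu^2}{p \sigma^2} |S \cap S'| \right).$$

Let $\{z_i\}_{i=1}^{p}$ be Bernoulli trials with success probability $p/n$. We see that the distribution of $|S \cap S'|$ is invariant
under conditioning on $S'$, and then it is equivalent to sampling without replacement from a population in which there
are $p$ successes. By Theorem 4 in Hoeffding [1963] we know that for $t > 0$, $\mathbb{E} e^{t |S \cap S'|} \le \mathbb{E} e^{t \sum_{i=1}^{p} z_i}$. Let $t = \mu^2/(p\sigma^2)$; then

$$\mathbb{E} e^{t \sum_{i=1}^{p} z_i} = \left( 1 + \frac{p}{n}\left(e^t - 1\right) \right)^p$$

by the generating function of Bernoulli random variables.

By the assumption $\mu^2/\sigma^2 = o(p)$, so for any $c > 0$, for $n$ large enough, $e^t - 1 \le c$ and

$$\left( 1 + \frac{p}{n}\left(e^t - 1\right) \right)^p \le \left( 1 + \frac{c p}{n} \right)^p \le \left( 1 + \frac{c}{p} \right)^p \le e^c$$

because $p \le \sqrt{n}$. Since $c > 0$ is arbitrary, $\mathbb{E} e^{t |S \cap S'|} \to 1$. All that remains is to notice that the cut sizes of $S \in \mathcal{S}$ are bounded by $\rho$
because the cut sizes are bounded by $p\, d_{\max} \le \rho$. □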



3 Spanning Tree Wavelets 

In this section, we present an algorithm for constructing a wavelet basis given a spanning tree and we characterize its
performance for the detection problem (1).

Informally, we would like to construct a basis $B$ for which each edge $e \in \mathcal{T}$ is activated by very few basis
elements, where we say that an edge $e$ is activated by element $b$ if $e \in \mathrm{supp}(\nabla_\mathcal{T} b)$. As we will show, upper bounding
the number of basis elements that activate any edge will be essential in analyzing the performance of our estimator
$\|By\|_\infty$.

We construct our wavelet basis $B$ recursively, by first finding a seed vertex in the spanning tree such that the
subtrees adjacent to the seed have at most $\lceil n/2 \rceil$ vertices and then by including basis elements localized on these subtrees
in $B$. We recurse on each subtree, adding higher-resolution elements to our basis, and consequently constructing a
complete wavelet basis. The first phase of the algorithm ensures that the depth of the recursion is at most $\lceil \log n \rceil$ and
the second ensures that each edge is activated by at most $\lceil \log d \rceil$ basis elements per recursive call. Combining these
two shows that each edge is activated by at most $\lceil \log d \rceil \lceil \log n \rceil$ basis elements.

Finding a balancing vertex in the tree parallels the technique in Pearl and Tarsi [1986], which finds a balancing
edge. The algorithm starts from any vertex $v \in \mathcal{T}$ and moves along $\mathcal{T}$ to a neighboring vertex $w$ that lies in the largest
connected component of $\mathcal{T} \setminus v$. The algorithm repeats this process (moving from $v$ to $w$) until the largest connected
component of $\mathcal{T} \setminus w$ is larger than the largest connected component of $\mathcal{T} \setminus v$, at which point it returns $v$. We call this
the FindBalance algorithm.
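The walk described above can be sketched as follows. This is a simple illustration rather than the linear-time routine: it recomputes component sizes at every step, so it runs in quadratic time. The adjacency-dictionary representation, the names `component_sizes` and `find_balance`, and the choice to stop as soon as a move would not strictly shrink the largest component (which avoids cycling on ties) are assumptions made for this sketch.

```python
from typing import Dict, List

Tree = Dict[int, List[int]]


def component_sizes(tree: Tree, v: int) -> Dict[int, int]:
    """Map each neighbor w of v to the size of w's component in the tree minus v."""
    sizes = {}
    for w in tree[v]:
        # Iterative depth-first search from w that never crosses v.
        stack, seen = [w], {v, w}
        while stack:
            u = stack.pop()
            for nxt in tree[u]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        sizes[w] = len(seen) - 1   # v itself is not part of the component
    return sizes


def find_balance(tree: Tree, start: int) -> int:
    """Move toward the largest component until the objective stops shrinking."""
    v = start
    while True:
        sizes = component_sizes(tree, v)
        if not sizes:              # a single-vertex tree is trivially balanced
            return v
        w = max(sizes, key=sizes.get)
        largest_at_w = max(component_sizes(tree, w).values())
        if largest_at_w >= sizes[w]:
            return v
        v = w


# Hypothetical example: the path 0-1-2-3-4, started from an endpoint.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
center = find_balance(path, start=0)
```

On the path the walk moves from vertex 0 to vertex 1 to vertex 2 and stops there, leaving two components of size 2.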

Once we have a balancing vertex $v$, we form wavelets that are constant over the connected components of $\mathcal{T} \setminus v$ such
that any vertex is supported by at most $\log d$ wavelets. Let $d_v$ be the degree of the balancing vertex $v$ and let $c_1, \ldots, c_{d_v}$
be the connected components of $\mathcal{T} \setminus v$ (with $v$ added to the smallest component). Our algorithm acts as if $c_1, \ldots, c_{d_v}$
form a chain structure and constructs the Haar wavelet basis over them. We call this algorithm FormWavelets:

1. Let $C_1 = \cup_{i \le d_v/2}\, c_i$ and $C_2 = \cup_{i > d_v/2}\, c_i$.

2. Form the following basis element and add it to $B$:

$$b = \frac{\sqrt{|C_1||C_2|}}{\sqrt{|C_1| + |C_2|}} \left[ \frac{1}{|C_1|} 1_{C_1} - \frac{1}{|C_2|} 1_{C_2} \right]$$

3. Recurse at (1) with the subcomponents of $C_1$ and $C_2$ with partitions $\{c_i\}_{i \le d_v/2}$ and $\{c_i\}_{i > d_v/2}$ respectively.
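The three FormWavelets steps can be sketched as a short recursion over a list of components. The sketch below assumes the components are already given as lists of vertex indices (how they are obtained from the tree is outside its scope), and the names `haar_element` and `form_wavelets` are hypothetical. Each element is scaled by $\sqrt{|C_1||C_2|/(|C_1|+|C_2|)}$ so that it has unit norm and sums to zero.

```python
import math
from typing import List, Sequence


def haar_element(n: int, left: Sequence[int], right: Sequence[int]) -> List[float]:
    """Unit-norm vector that is constant on `left` and on `right` and sums to zero."""
    scale = math.sqrt(len(left) * len(right) / (len(left) + len(right)))
    b = [0.0] * n
    for v in left:
        b[v] = scale / len(left)
    for v in right:
        b[v] = -scale / len(right)
    return b


def form_wavelets(n: int, components: Sequence[Sequence[int]]) -> List[List[float]]:
    """Haar elements over a chain of components, splitting the chain in half."""
    if len(components) < 2:
        return []
    half = len(components) // 2                 # indices i <= d/2 go to C1
    first, second = components[:half], components[half:]
    c1 = [v for comp in first for v in comp]
    c2 = [v for comp in second for v in comp]
    basis = [haar_element(n, c1, c2)]
    basis += form_wavelets(n, first)            # recurse within C1
    basis += form_wavelets(n, second)           # recurse within C2
    return basis


# Hypothetical example: four components covering seven vertices.
components = [[0], [1, 2], [3], [4, 5, 6]]
basis = form_wavelets(7, components)
```

With $d$ components the recursion produces $d - 1$ mutually orthogonal elements, and each vertex lies in the support of roughly $\log_2 d$ of them.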

Our algorithm recursively constructs basis elements using the FindBalance and FormWavelets routines on subtrees
of $\mathcal{T}$. We initialize $\mathcal{T}$ to be a spanning tree of the graph and start with no elements in our basis.

1. Let $v$ be the output of FindBalance applied to $\mathcal{T}$.

2. Let $\mathcal{T}_1, \ldots, \mathcal{T}_{d_v}$ be the connected components of $\mathcal{T} \setminus v$ and add $v$ to the smallest component.

3. Add the basis elements constructed in FormWavelets when applied to $\mathcal{T}_1, \ldots, \mathcal{T}_{d_v}$.

4. For each $i \in [d_v]$, recursively apply (1) - (4) on $\mathcal{T}_i$ as long as $|\mathcal{T}_i| \ge 2$.

As we will see, controlling the sparsity $\|Bx\|_0$ is essential in analyzing the performance of the estimator $\|By\|_\infty$.
The main theoretical guarantee of our basis construction algorithm is that signals with small cuts in $G$ are sparse in $B$.
Specifically, we prove the following key lemma in the appendix:

Lemma 3. Let $\nabla$ be the incidence matrix of $G$ and $\nabla_\mathcal{T}$ be the incidence matrix of $\mathcal{T}$ (where $\mathcal{T}$ has degree at most $d$).
Then $\|\nabla x\|_0$ is the cut size of pattern $x \in \mathbb{R}^V$, and for any $x \in \mathbb{R}^V$,

$$\|Bx\|_0 \le \|\nabla_\mathcal{T} x\|_0 \lceil \log d \rceil \lceil \log n \rceil \le \|\nabla x\|_0 \lceil \log d \rceil \lceil \log n \rceil \tag{2}$$

Equipped with Lemma 3 we can now characterize the performance of the estimator $\|By\|_\infty$ on any signal $x$. Our
bound depends on the choice of spanning tree $\mathcal{T}$, specifically via the quantity $\|\nabla_\mathcal{T} x\|_0$, the cut size of $x$ in $\mathcal{T}$. The
proof of the following can be found in the appendix.



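Theorem 4. Perform the test in which we reject the null if $\|By\|_\infty > \tau$. Set $\tau = \sigma \sqrt{2 \log(n/\delta)}$. If

$$\frac{\mu}{\sigma} \ge \sqrt{2 \|\nabla_\mathcal{T} x\|_0 \lceil \log d \rceil \lceil \log n \rceil} \left( \sqrt{\log(1/\delta)} + \sqrt{\log(n/\delta)} \right) \tag{3}$$

then under $H_0$, $\mathbb{P}\{\text{Reject}\} \le \delta$, and under $H_1$, $\mathbb{P}\{\text{Reject}\} \ge 1 - \delta$.

Remark 5. For any tree we have $\|\nabla_\mathcal{T} x\|_0 \le \|\nabla x\|_0$ for all patterns $x$, so that for the sparse cut alternative we can
have both Type I and Type II errors $\le \delta$ as long as:

$$\frac{\mu}{\sigma} \ge \sqrt{2 \rho \lceil \log d \rceil \lceil \log n \rceil} \left( \sqrt{\log(1/\delta)} + \sqrt{\log(n/\delta)} \right) \tag{4}$$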
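To make the test in Theorem 4 concrete, the sketch below computes the threshold $\tau = \sigma\sqrt{2\log(n/\delta)}$, applies the rejection rule to a vector of wavelet coefficients, and evaluates the right-hand side of the signal-to-noise condition. It assumes the coefficient vector $By$ has already been computed and has length $n$; the function names are hypothetical, and taking the ceilings of base-2 logarithms is an assumption of this sketch, since the text does not fix the base.

```python
import math
from typing import Sequence


def threshold(sigma: float, n: int, delta: float) -> float:
    """tau = sigma * sqrt(2 log(n / delta))."""
    return sigma * math.sqrt(2.0 * math.log(n / delta))


def reject_null(coeffs: Sequence[float], sigma: float, delta: float) -> bool:
    """Reject H0 when the largest coefficient magnitude ||By||_inf exceeds tau."""
    n = len(coeffs)
    return max(abs(c) for c in coeffs) > threshold(sigma, n, delta)


def required_snr(cut: int, d: int, n: int, delta: float) -> float:
    """Right-hand side of the SNR condition, using base-2 ceilings (an assumption)."""
    k = cut * math.ceil(math.log2(d)) * math.ceil(math.log2(n))
    return math.sqrt(2.0 * k) * (
        math.sqrt(math.log(1.0 / delta)) + math.sqrt(math.log(n / delta))
    )


# Hypothetical coefficients: mostly small, with one large entry.
quiet = [0.1] * 100
active = [0.1] * 99 + [5.0]
decisions = (reject_null(quiet, 1.0, 0.05), reject_null(active, 1.0, 0.05))
```

The threshold grows only like $\sqrt{\log n}$, so a single coefficient carrying most of the signal energy is enough to trigger a rejection; this is why sparsity of $Bx$ matters.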
4 Uniform Spanning Tree Basis

The uniform spanning tree (UST) is a spanning tree generation technique that we will use to construct wavelet bases.
We will first examine the deep connection between electrical networks, USTs and random walks. Because the UST
is randomly generated, the test statistic $\|B_\mathcal{T} y\|_\infty$, when conditioned on $y$, will also be random. Due to results from cut
sparsification, we can relate the performance of the UST wavelet detector to effective resistances.



4.1 Cuts and Effective Resistance 

Effective resistances have been extensively studied in electrical network theory. We define the combinatorial Laplacian
of $G$ to be $\Delta = \nabla^\top \nabla$. A potential difference is any $z \in \mathbb{R}^E$ such that it satisfies Kirchhoff's potential law: the total
potential difference around any cycle is 0. Algebraically, this means that $\exists x \in \mathbb{R}^V$ such that $\nabla x = z$. The Dirichlet
Principle states that any solution to the following program gives an absolute potential $x$ that satisfies Kirchhoff's
potential law:

$$\min_x\ x^\top \Delta x \quad \text{s.t.} \quad x_S = v_S$$

for source/sinks $S \subset V$ and some voltage constraints $v_S \in \mathbb{R}^S$. The realized objective $x^\top \Delta x$ is known as the total
energy of the system. By Lagrangian calculus, the solution to the above program is given by $x = \Delta^\dagger v$ where $\dagger$
indicates the Moore-Penrose pseudoinverse. The effective resistance is the total energy of a system in which $v, w \in V$
are the source and sink respectively and a unit flow from $v$ to $w$ is induced. Hence, the effective resistance between $v$
and $w$ is $r_{v,w} = (\delta_v - \delta_w)^\top \Delta^\dagger (\delta_v - \delta_w)$, where $\delta_v$ is the Dirac delta function.
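For small graphs the effective resistance can be computed exactly. The sketch below does not form the pseudoinverse; instead it uses the standard equivalent approach for connected graphs of grounding the sink $w$ (deleting its row and column from the Laplacian), injecting a unit current at $v$, and reading off the potential at $v$. The helper names `solve` and `effective_resistance`, and the use of exact rational arithmetic, are choices made for this illustration.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def solve(a: List[List[Fraction]], b: List[Fraction]) -> List[Fraction]:
    """Solve a x = b by Gauss-Jordan elimination in exact arithmetic."""
    size = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(size):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][size] / aug[i][i] for i in range(size)]


def effective_resistance(n: int, edges: Sequence[Edge], v: int, w: int) -> Fraction:
    """Potential at v when unit current enters at v and leaves at the grounded w."""
    lap = [[Fraction(0)] * n for _ in range(n)]
    for a, b in edges:
        lap[a][a] += 1
        lap[b][b] += 1
        lap[a][b] -= 1
        lap[b][a] -= 1
    keep = [u for u in range(n) if u != w]      # ground w: drop its row and column
    reduced = [[lap[i][j] for j in keep] for i in keep]
    current = [Fraction(1 if u == v else 0) for u in keep]
    potential = solve(reduced, current)
    return potential[keep.index(v)]


# Hypothetical example: the 4-cycle 0-1-2-3-0 with unit resistances.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
r_adjacent = effective_resistance(4, cycle, 0, 1)   # 1 ohm in parallel with 3 ohms
r_opposite = effective_resistance(4, cycle, 0, 2)   # 2 ohms in parallel with 2 ohms
```

The sketch requires a connected graph and $v \neq w$; otherwise the reduced system is singular or the index lookup fails.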

A particularly useful characterization of effective resistance is the random walk interpretation. Let $X_t$ be the location
of a random walker on $G$ at time $t$. The hitting time $H(v, w)$ is then

$$H(v, w) = \mathbb{E}[\min\{t \ge 0 : X_t = w\} \mid X_0 = v]$$

We find that the effective resistance is related to the hitting time by

$$r_{v,w} = \frac{H(v, w) + H(w, v)}{2m}$$

The numerator is also known as the commute time. As we will see, this characterization of effective resistance is
useful when bounding it for specific graph models.
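As a quick check of the relation $r_{v,w} = (H(v,w) + H(w,v))/(2m)$, with $m = |E|$, consider the path $a - b - c$ on three vertices. This example is illustrative and not taken from the original text. Here $m = 2$, and the effective resistance between the endpoints is that of two unit resistors in series, namely 2. First-step analysis of the random walk gives the same value:

```latex
% Hitting times to c on the path a - b - c:
%   H(a,c) = 1 + H(b,c),   H(b,c) = 1 + \tfrac{1}{2} H(a,c) + \tfrac{1}{2} \cdot 0
\begin{align*}
H(b,c) &= 1 + \tfrac{1}{2}\bigl(1 + H(b,c)\bigr)
  \;\Rightarrow\; H(b,c) = 3, \qquad H(a,c) = 4, \\
H(c,a) &= 4 \quad \text{(by symmetry)}, \\
r_{a,c} &= \frac{H(a,c) + H(c,a)}{2m} = \frac{4 + 4}{2 \cdot 2} = 2 .
\end{align*}
```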



4.2 UST Wavelet Detector 



In our framework, we are given the opportunity to evaluate our test according to our random algorithm. We will
now examine the performance of the spanning tree wavelet detector when the spanning tree is drawn according to a
UST. First, we will explore the construction of the UST and examine key properties. The UST is a random spanning
tree, chosen uniformly at random from the set of all distinct spanning trees. The foundational Matrix-Tree theorem
Kirchhoff [1847] describes the probability of an edge being included in the UST. The following lemma can be found
in Lovász [1993] and Lyons and Peres [2000].

Lemma 6. Let $G$ be a graph and $\mathcal{T}$ a draw from $\mathrm{UST}(G)$. Then

$$\mathbb{P}\{e \in \mathcal{T}\} = r_e$$

Hence, we can expect that, for a given cut in the graph, the cut size in the tree will look like the sum of edge
effective resistances. While it is infeasible to enumerate all spanning trees of a graph, the Aldous-Broder algorithm is
an efficient method for generating a draw from $\mathrm{UST}(G)$ Aldous [1990]. The algorithm simulates a random walk on $G$,
$\{X_t\}$, stops when all of the vertices have been visited, and defines the spanning tree $\mathcal{T}$ by the edges through which each
vertex other than $X_0$ was first entered, $\{(X_{H(X_0, v) - 1}, v) : v \in V \setminus \{X_0\}\}$, where here $H(X_0, v)$ denotes the realized
first time at which the walk visits $v$.
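The random-walk construction just described can be sketched in a few lines. The adjacency-dictionary input, the explicit `random.Random` generator, and the name `aldous_broder` are assumptions of this sketch; the graph is assumed to be connected, since otherwise the walk never visits every vertex.

```python
import random
from typing import Dict, List, Set, Tuple


def aldous_broder(adj: Dict[int, List[int]], rng: random.Random) -> Set[Tuple[int, int]]:
    """Sample a spanning tree of a connected graph from a simple random walk."""
    current = rng.choice(sorted(adj))
    visited = {current}
    tree: Set[Tuple[int, int]] = set()
    while len(visited) < len(adj):
        nxt = rng.choice(adj[current])
        if nxt not in visited:
            # First entrance into nxt: keep the edge the walk arrived by.
            visited.add(nxt)
            tree.add((min(current, nxt), max(current, nxt)))
        current = nxt
    return tree


# Hypothetical example: the 4-cycle, where every edge has effective resistance 3/4.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
rng = random.Random(0)
tree = aldous_broder(cycle, rng)
```

Repeating the draw many times and recording how often a fixed edge appears gives an empirical estimate of its inclusion probability, which on the 4-cycle should be close to the effective resistance 3/4 from Lemma 6.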

In order to control $\|\nabla_\mathcal{T} x\|_0$, we need to control the overlap between a cut and the UST. Clearly the UST does not
independently sample edges, but it does have the well documented property of negative association: the inclusion
of an edge decreases the probability that another edge is included. The following lemma states a concentration result
for the UST, based on negative association, and can be found in Fung and Harvey [2010]. The proof is a simple
extension of the concentration results in Gandhi et al. [2006].
Lemma 7. Let $B \subset E$ be a fixed subset of edges, and $|\mathcal{T} \cap B|$ denote the number of edges in $\mathcal{T}$ also in $B$. Then

$$\mathbb{P}\left\{ |\mathcal{T} \cap B| \ge (1 + \delta) \sum_{e \in B} r_e \right\} \le \left( \frac{e^{\delta}}{(1 + \delta)^{1 + \delta}} \right)^{\sum_{e \in B} r_e}$$

We use this result to give conditions under which the UST wavelet detector asymptotically distinguishes $H_0$ from
$H_1$.

Theorem 8. Let $r_{\max} = \max_{x \in \mathcal{X}} \sum_{e \in \mathrm{supp}(\nabla x)} r_e$ (the maximum effective resistance of a cut in $\mathcal{X}$). If

$$\frac{\mu}{\sigma} = \omega\left( \sqrt{r_{\max} \log d}\, \log n \right)$$

then $H_0$ and $H_1$ are asymptotically distinguished by the test statistic $\|By\|_\infty$ where $B$ is the UST wavelet basis.

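Proof. Let $r_B = \sum_{e \in B} r_e$ for $B \subset E$. By some basic calculus, and the fact that $\log(1 + x) \ge x/(1 + x/2)$, we see
that

$$\mathbb{P}\{|\mathcal{T} \cap B| \ge (1 + \delta) r_B\} \le \exp\left( -\frac{\delta^2 r_B}{2 + \delta} \right).$$

Rewriting Lemma 7, we obtain with probability $\ge 1 - \gamma$

$$|\mathcal{T} \cap B| \le r_B + \sqrt{2 r_B \log\frac{1}{\gamma} + \frac{1}{4}\left(\log\frac{1}{\gamma}\right)^2} + \frac{1}{2}\log\frac{1}{\gamma} \le r_B + \sqrt{2 r_B \log\frac{1}{\gamma}} + \log\frac{1}{\gamma}.$$

Now, because $\|\nabla_\mathcal{T} x\|_0 = |\mathcal{T} \cap B|$ for $B = \mathrm{supp}(\nabla x)$, we know by Theorem 4 that if

$$\frac{\mu}{\sigma} = \omega\left( \sqrt{\left( r_B + \sqrt{2 r_B \log\frac{1}{\gamma}} + \log\frac{1}{\gamma} \right) \log d}\ \log n \right)$$

then $H_0$ and $H_1$ are asymptotically distinguished, and the result follows because we guarantee this uniformly for all
such $B$. □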



5 Specific Graph Models

In this section we study our detection problem for several different families of graphs. Specifically, we control the
effective resistance $r_e$ for each graph family, which when combined with Theorem 8 gives a lower bound on the SNR
for which $\|By\|_\infty$ asymptotically distinguishes $H_0$ and $H_1$.

In Theorem 8 we showed that the consistency regime depends on the effective resistances of the cuts induced by
the class of signals $\mathcal{X}$. On its own, it is not immediately clear that this result is an improvement over the bound in
Remark 5 that we would obtain from any spanning tree. However, Foster's theorem highlights why we expect the
effective resistance to be less than the cut size.

Theorem 9 (Foster's Theorem, Foster [1949], Tetali [1991]).

$$\sum_{e \in E(G)} r_e = n - 1$$

Hence, if we select an edge uniformly at random from the graph, we expect its effective resistance to be
$(n - 1)/m$ (on the order of the reciprocal of the average degree), where $m = |E|$. Indeed, in several example graphs we can
formalize this intuition and give an improvement over Remark 5.
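Foster's theorem can be checked by hand on two simple edge transitive graphs. The examples below, the cycle $C_n$ and the complete graph $K_n$ with unit resistances, are illustrative and use only standard series and parallel resistance facts:

```latex
\begin{align*}
C_n:&\quad r_e = \frac{1 \cdot (n-1)}{1 + (n-1)} = \frac{n-1}{n},
  & \sum_{e \in E} r_e &= n \cdot \frac{n-1}{n} = n - 1, \\
K_n:&\quad r_e = \frac{2}{n}, \quad m = \binom{n}{2},
  & \sum_{e \in E} r_e &= \frac{n(n-1)}{2} \cdot \frac{2}{n} = n - 1 .
\end{align*}
```

In $K_n$ a cut consisting of $\rho$ edges therefore has total effective resistance $2\rho/n$, which is far smaller than its cut size $\rho$ when $n$ is large.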

We complement these results with two types of simulations verifying different aspects of our theory. The first
verifies the upper bound in Lemma 3 for a variety of graph models by plotting $\|Bx\|_0$ versus $\rho \log(d) \log(n)$ for
several randomly generated signals. These plots (see Figure 1) demonstrate the validity of our bound since in all
cases $\|Bx\|_0 \le \rho \log(d) \log(n)$, but, more importantly, the readily-observable linear relationship between these two
quantities suggests that one should not expect an improvement on this bound by more than a constant factor.

The second simulation verifies the performance of our spanning tree wavelets detector on various graph models.
In Figure 2 we plot the power of our test statistic (with Type I error fixed at 5%) as a function of signal strength $\mu$ for
several values of $n$, where we allow $\rho$ to scale with $n$ to ensure a non-empty $\mathcal{X}$. These simulations demonstrate that, as
expected, for sufficiently large signal strength our statistic can separate $H_0$ from $H_1$. More importantly, the threshold
signal strength for which detection is possible increases with $n$ and $\rho$, as predicted by our theory.

Figure 1: Spanning tree wavelet basis sparsity as a function of $\rho \log d \log n$ for, from left to right, 2-dimensional torus,
complete, $k$-NN, and $\epsilon$-graphs. Linear fits have slopes: 0.10, 0.0021, 0.010, 0.0059 and coefficients: 0.88, 0.72,
0.76, 0.71 respectively.

Figure 2: Power as a function of signal strength for different values of $n$ for 2-dimensional torus, complete, $k$-NN, and
$\epsilon$-graphs. $\rho$ scales like $\sqrt{n}$, $n$, and two further powers of $n$, respectively.



5.1 Edge Transitive Graphs 

An edge transitive graph, $G$, is one such that for any edges $e_0, e_1$, there is a graph automorphism that maps $e_0$ to $e_1$.
Examples of edge transitive graphs include the $l$-dimensional torus and the complete graph $K_n$. For such a graph,
every edge has the same effective resistance, and Foster's Theorem then shows that $r_e = (n - 1)/m$ where $m$ is the
number of edges. Moreover, since edge transitive graphs must be $d$-regular for some degree $d$, we see that $m = \Theta(nd)$,
so that $r_e = \Theta(1/d)$. This leads us to the following corollary, which we note matches the lower bound in Theorem 1
modulo logarithmic terms if $\rho/d \le \sqrt{n}$:

Corollary 10. Let $G$ be edge transitive with common degree $d$. Then for each edge $e \in E(G)$, $r_e = (n - 1)/m$.
Consider the hypothesis testing problem (1) where the set $\mathcal{X}$ is parameterized by $\rho$. If

$$\frac{\mu}{\sigma} = \omega\left( \sqrt{\frac{\rho}{d} \log d}\, \log n \right)$$

then the UST wavelet detector, $\|By\|_\infty$, asymptotically distinguishes $H_0$ and $H_1$.



5.2 kNN Graphs 



Oftentimes in applications, the graph topology is derived from data. In this case, the randomness of the data means that
the graph itself is inherently random. Commonly, these graphs are modeled as random geometric graphs, and in this
section we will devote our attention to the symmetric $k$-nearest neighbor graphs. Specifically, suppose that $z_1, \ldots, z_n$
are drawn i.i.d. from a density $p$ supported over $\mathbb{R}^d$. Then we form the graph $G$ over $[n]$ by connecting vertices $i, j$ if
$z_i$ is amongst the $k$-nearest neighbors of $z_j$ or vice versa. Some regularity conditions on $p$ are needed for our results
to hold; they can be found in Von Luxburg et al. [2010].
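The symmetric $k$-NN construction can be sketched directly from this definition. The sketch uses brute-force distance computations, which is adequate only for small point sets; the function name `symmetric_knn_graph` and the uniformly sampled points in the unit square are assumptions for illustration.

```python
import math
import random
from typing import List, Sequence, Set


def symmetric_knn_graph(points: Sequence[Sequence[float]], k: int) -> List[Set[int]]:
    """Connect i and j if either one is among the other's k nearest neighbors."""
    n = len(points)
    adj: List[Set[int]] = [set() for _ in range(n)]
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)   # symmetrize: the "or vice versa" in the definition
    return adj


# Hypothetical data: 30 points drawn uniformly from the unit square.
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(30)]
graph = symmetric_knn_graph(points, k=3)
```

Because each vertex is linked to its own $k$ nearest neighbors, every degree in the resulting graph is at least $k$, a property used when bounding the effective resistances.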



To bound the effective resistance $r_e$, Corollary 9 in Von Luxburg et al. [2010] shows that $H_{ij}/2m \to 1/d_j$, and by
the definition of $r_e$ we see that $r_e \to \frac{1}{d_i} + \frac{1}{d_j} \le \frac{2}{k}$ since $d_i \ge k$ for each $i$. A formal analysis leads to the following
corollary, which we prove in Appendix B with more precise concentration arguments:

Corollary 11. Let $G$ be a $k$-NN graph with $k/n \to 0$ and $k(k/n)^{2/d} \to \infty$ and where the density $p$ satisfies the
regularity conditions in Von Luxburg et al. [2010]. Consider the hypothesis testing problem (1) where the set $\mathcal{X}$ is
parameterized by $\rho$. If

$$\frac{\mu}{\sigma} = \omega\left( \sqrt{\frac{\rho}{k} \log d}\, \log n \right)$$

then the UST wavelet detector, $\|By\|_\infty$, asymptotically distinguishes $H_0$ and $H_1$.

5.3 e-Graphs 

The $\epsilon$-graph is another widely used random geometric graph in machine learning and statistics. As with the $k$-NN
graph, the vertices are embedded into $\mathbb{R}^d$ and edges are added between pairs of vertices that are within distance $\epsilon$
of each other. As with the $k$-NN graph, Corollary 8 from Von Luxburg et al. [2010] shows that $H_{ij}/2m \to 1/d_j$ for
each pair of vertices. This leads us to believe that $r_{ij} \to 1/d_i + 1/d_j$. If the density $p$ from which we draw
data points is bounded from below by some constant, then we can uniformly lower bound all of the degrees $d_i$ using
fairly elementary concentration results, which results in an upper bound on $r_e$. Formalizing this intuition, we have the
following corollary, which we prove in Appendix B.

Corollary 12. Let $G$ be an $\epsilon$-graph with points $X_1, \ldots, X_n$ drawn from a density $p$, which satisfies the regularity
conditions in Von Luxburg et al. [2010] and is lower bounded by some constant $p_{\min}$ (independent of $n$). Let $\epsilon \to 0$,
$n\epsilon^{d+2} \to \infty$ and consider the hypothesis testing problem (1) where the set $\mathcal{X}$ is parameterized by $\rho$. If

$$\frac{\mu}{\sigma} = \omega\left( \sqrt{\frac{\rho}{n\epsilon^d} \log d}\, \log n \right)$$

then $\|By\|_\infty$ asymptotically distinguishes $H_0$ and $H_1$.



6 Discussion 

We studied the detection of piece-wise constant activation patterns over graphs, and provided a necessary condition 
for the asymptotic distinguishability of signals that are assumed to have few discontinuities. We gave a novel spanning 
tree wavelet construction, that is the extension of the Haar wavelet basis, for arbitrary graphs. While it achieves 
strong theoretical performance for detection, the spanning tree wavelet construction could be of independent interest 
for compression and denoising. The uniform spanning tree wavelet detector was shown to have strong theoretical 
guarantees that in many cases give us near optimal performance. This means that under adversarial choice of signal,
our randomized algorithm asymptotically distinguishes $H_0$ from $H_1$. Alternatively, this means that for any given signal
(non-adversarial setting) the vast majority of spanning trees induce detectors that asymptotically distinguish $H_0$
from $H_1$ for low signal-to-noise ratios.
A Proofs for Section 3

A.1 Proof of Lemma 3

Before we proceed with the proof, we state and prove two results on the performance of the algorithm: 

Lemma 13. Let $\mathcal{T}$ be a tree. FindBalance returns a vertex $v$ such that the largest connected component of $\mathcal{T} \setminus v$ is of
size at most $\lceil |\mathcal{T}|/2 \rceil$ in $O(|\mathcal{T}|)$ time.

Proof. Let the objective be the size of the largest connected component of $\mathcal{T} \setminus v$. Every move in FindBalance reduces
the objective by at least 1 and the objective can be at most $|\mathcal{T}| - 1$, so it must terminate in less than $|\mathcal{T}|$ moves. Now
at any step of FindBalance, if the objective is greater than $\lceil |\mathcal{T}|/2 \rceil$, the cumulative size of the remaining connected
components is less than $\lfloor |\mathcal{T}|/2 \rfloor$. Hence, in the next step the connected component formed by these is less than
$\lceil |\mathcal{T}|/2 \rceil$. Thus, the program cannot terminate at a move directly after the objective is greater than $\lceil |\mathcal{T}|/2 \rceil$. □

We will also require the following claim. Indeed, controlling the depth of the recursion in the wavelet construction
is the sine qua non for controlling the sparsity, $\|Bx\|_0$.

Claim 14. The wavelet construction has recursion depth at most $\lceil \log d \rceil \lceil \log n \rceil$.

Proof. Whenever FormWavelets is applied it increases the height of the dendrogram by at most
$\lceil \log d \rceil$. By Lemma 13 the size of the remaining components is halved, so the algorithm terminates in at most $\lceil \log n \rceil$
steps. □

Proof of Lemma 3. We will show that any edge $e \in \mathcal{T}$ is activated by at most $\lceil \log d \rceil \lceil \log n \rceil$ basis elements in $B$, and
this will imply the result. We will say that an edge $e$ is activated by a basis element $b$ if $e \in \mathrm{supp}(\nabla_\mathcal{T} b)$. It follows
that for a basis element $b$, if $b^\top x \neq 0$ then $\exists e$ that is activated by $b$. Let activations$(e)$ be the number of basis elements
that activate $e$ (activations$(e) = 0$ if $e \notin \mathrm{supp}(\nabla_\mathcal{T} x)$). We then have

$$\|Bx\|_0 \le \sum_{e \in \mathrm{supp}(\nabla_\mathcal{T} x)} \text{activations}(e)$$

Consider some edge $e$. If $e$ is activated by some subtree $\mathcal{T}_{sub}$ (we use this interchangeably with being activated
by the basis element formed by partitioning $\mathcal{T}_{sub}$ into two groups), then it can be activated by at most one of $\mathcal{T}_{sub}$'s
subtrees. This implies that activations$(e)$ is upper bounded by the depth of the recursion. By the claim, we find that

$$\|Bx\|_0 \le \sum_{e \in \mathrm{supp}(\nabla_\mathcal{T} x)} \lceil \log d \rceil \lceil \log n \rceil \le \|\nabla_\mathcal{T} x\|_0 \lceil \log d \rceil \lceil \log n \rceil,$$

proving the first claim. The second claim is obvious from the fact that $\mathcal{T}$ contains a subset of the edges in $G$, so every
cut has a cut size in $G$ at least as large as it does in $\mathcal{T}$. □



A.2 Proof of Theorem 4

Proof. Under the null $x = 0$, and we have that

$$\|By\|_\infty = \|B\epsilon\|_\infty < \sigma \sqrt{2 \log(n/\delta)}$$

with probability at least $1 - \delta$. So, as long as $\tau = \sigma \sqrt{2 \log(n/\delta)}$, we control the probability of false alarm (type
1 error). For an element $x$ of the alternative, let the index $i^*$ achieve the maximum of $Bx$ (i.e. $\|Bx\|_\infty = |Bx|_{i^*}$).
Then $|By|_{i^*} \ge |Bx|_{i^*} - \sigma \sqrt{2 \log(1/\delta)}$ with probability at least $1 - \delta$ and

$$\|Bx\|_\infty^2 \ge \frac{\sum_{i : (Bx)_i \neq 0} (Bx)_i^2}{\|Bx\|_0} = \frac{\|Bx\|_2^2}{\|Bx\|_0} = \frac{\|x\|_2^2}{\|Bx\|_0}$$

Taking square roots and combining this with Lemma 3,

$$\|Bx\|_\infty \ge \frac{\|x\|_2}{\sqrt{\|\nabla_\mathcal{T} x\|_0 \lceil \log d \rceil \lceil \log n \rceil}}$$

from which we have the result,

$$|By|_{i^*} \ge \frac{\|x\|_2}{\sqrt{\|\nabla_\mathcal{T} x\|_0 \lceil \log d \rceil \lceil \log n \rceil}} - \sigma \sqrt{2 \log(1/\delta)}$$

Forcing this lower bound to be greater than $\tau$ gives us our result. □



B Proofs For Section 5

B.1 Proof of Corollary 11

First we restate Corollary 9 from Von Luxburg et al. [2010]:

Corollary 15. Consider an unweighted symmetric or mutual $k$-NN graph built from a sequence $X_1, \ldots, X_n$ drawn
i.i.d. from a density $p$. Then there exist constants $c_1, c_2, c_3$ such that with probability at least $1 - c_1 n \exp(-k c_2)$ we
have uniformly for all $i \neq j$ that:

$$\left| \frac{H_{ij}}{2m} - \frac{1}{d_j} \right| \le c_3 \frac{n^{2/d}}{k^{2 + 2/d}}$$

Proof of Corollary 11. We focus on the symmetric $k$-NN graph in which we connect $v_i$ to $v_j$ if $v_i$ is in the $k$-nearest
neighbors of $v_j$ or vice versa. In this graph, every node has degree $\ge k$, which will be crucial in our analysis. Our
goal is to bound the effective resistance of every edge, so that we can subsequently bound $r_{\max}$ and apply Theorem 8.
From the definition of $r_{ij}$ we have:

$$\begin{aligned}
r_{ij} &= \frac{H_{ij} + H_{ji}}{2m} \\
&\le \frac{2 c_3 n^{2/d}}{k^{2 + 2/d}} + \frac{1}{d_i} + \frac{1}{d_j} \\
&\le \frac{2 c_3 n^{2/d}}{k^{2 + 2/d}} + \frac{2}{k}
\end{aligned}$$

where the first line is the definition of $r_{ij}$, the second line follows from Corollary 15, and the last line follows from the
fact that $d_i \ge k$ for each vertex. Since $k(k/n)^{2/d} \to \infty$, we see that $r_{ij} = O(1/k)$. Moreover, with this scaling of $k$,
the probability in Corollary 15 is going to 1. We can therefore bound $r_{\max}$ as:

$$r_{\max} \le \rho \left( \frac{2 c_3 n^{2/d}}{k^{2 + 2/d}} + \frac{2}{k} \right) = O\left( \frac{\rho}{k} \right)$$

since the first term is of lower order than the second. Plugging this bound on $r_{\max}$ into Theorem 8 gives the result. □

B.2 Proof of Corollary 12

As before, we first state Corollary 8 from Von Luxburg et al. [2010]:



Corollary 16. Consider an unweighted $\epsilon$-graph built from the sequence $X_1, \ldots, X_n$ drawn i.i.d. from the density $p$.
Then there exist constants $c_1, \ldots, c_5 > 0$ such that with probability at least $1 - c_1 n \exp(-c_2 n \epsilon^d) - c_3 \exp(-c_4 n \epsilon^d)/\epsilon^d$,
we have uniformly for all $i \neq j$ that:

$$\left| \frac{n \epsilon^d}{2m} H_{ij} - \frac{n \epsilon^d}{d_j} \right| \le \frac{c_5}{n \epsilon^{d+2}}$$

Proof of Corollary 12. Some manipulation of the result in Corollary 16 reveals that:

$$H_{ij} \le \frac{2m}{d_j} + \frac{2 c_5 m}{n^2 \epsilon^{2d+2}}$$

Under our scaling, the second term is of lower order than the first and the probability in Corollary 16 goes to one, so $H_{ij} = O(m/d_j)$.
We will now give a lower bound on $d_j$. If $X_i$ is in the ball of radius $\epsilon$ centered at $X_j$, then we connect $X_j$ and $X_i$.
Thus $d_j$ is exactly the number of vertices in $B(X_j; \epsilon)$. The regularity condition on $p$ in Von Luxburg et al. [2010]
requires that there exist constants $\alpha$ and $\epsilon_0$ such that for all $\epsilon < \epsilon_0$ and for all $x \in \mathrm{supp}(p)$, $\mathrm{vol}(B(x; \epsilon) \cap \mathrm{supp}(p)) \ge
\alpha\, \mathrm{vol}(B(x; \epsilon))$. By this fact, the fact that the density is lower bounded by $p_{\min}$, and by the fact that $\epsilon \to 0$, we know
that for sufficiently large $n$, $p(B(X_j; \epsilon)) \ge p_{\min} \alpha c_d \epsilon^d$ where $c_d \epsilon^d$ is the volume of a $d$-dimensional ball of radius
$\epsilon$. The event that $X_i \in B(X_j; \epsilon)$ is distributed as a Bernoulli random variable with mean $\ge \alpha p_{\min} c_d \epsilon^d$. By
Hoeffding's inequality and a union bound we get that:

$$d_j \ge n \alpha p_{\min} c_d \epsilon^d - \sqrt{n \log(n)} = \Omega(n \epsilon^d)$$

for all vertices $j$ with probability tending to one. Using the definition of $r_{ij}$ along with the bounds on $H_{ij}$ and $d_j$ we have
that uniformly for all pairs

$$r_{ij} = O\left( \frac{1}{n \epsilon^d} \right)$$

Plugging this bound into Theorem 8 gives us the result. □